Using Prefix-Trees for Efficiently Computing Set Joins
Abstract
Joins on set-valued attributes (set joins) have numerous database applications. In this paper we propose PRETTI (PREfix Tree based seT joIn), a suite of set join algorithms for containment, overlap and equality join predicates. Our algorithms use prefix trees and inverted indices, which are built on the fly when they have not been precomputed. As a result, the algorithms can be applied to relations that have no indices, and to intermediate results in join queries over more than two relations. The algorithms also output results continuously during execution rather than only at the end. Experiments on real-life datasets show that the total execution time of our algorithms is significantly lower than that of previous approaches, even when the indices they require are not precomputed.
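The abstract names the building blocks (a prefix tree and an inverted index) without showing how they combine. As a rough illustration only, here is a minimal in-memory Python sketch of a containment join (all pairs with r ⊆ s) in that spirit: a prefix tree is built over the sorted sets of R, an inverted index over S, and a depth-first walk of the tree intersects each node's candidate list with the inverted list of that node's element. This is not the authors' PRETTI implementation; the names `TrieNode`, `build_prefix_tree`, `build_inverted_index` and `containment_join` are hypothetical, and set elements are assumed to be sortable integers.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple


class TrieNode:
    """Node of a prefix tree built over the sorted sets of relation R (hypothetical helper)."""

    __slots__ = ("children", "r_ids")

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.r_ids: List[int] = []  # ids of R-sets whose sorted sequence ends at this node


def build_prefix_tree(r_sets: List[Set[int]]) -> TrieNode:
    """Insert every R-set, in sorted element order, so shared prefixes share nodes."""
    root = TrieNode()
    for rid, s in enumerate(r_sets):
        node = root
        for elem in sorted(s):
            if elem not in node.children:
                node.children[elem] = TrieNode()
            node = node.children[elem]
        node.r_ids.append(rid)
    return root


def build_inverted_index(s_sets: List[Set[int]]) -> Dict[int, Set[int]]:
    """Map each element to the ids of the S-sets that contain it."""
    index: Dict[int, Set[int]] = defaultdict(set)
    for sid, s in enumerate(s_sets):
        for elem in s:
            index[elem].add(sid)
    return index


def containment_join(r_sets: List[Set[int]], s_sets: List[Set[int]]) -> Iterator[Tuple[int, int]]:
    """Yield (r_id, s_id) for every pair with r_sets[r_id] <= s_sets[s_id].

    Both structures are built on the fly, and pairs are yielded as soon as
    the walk reaches the node where an R-set ends, not only at the end.
    """
    root = build_prefix_tree(r_sets)
    index = build_inverted_index(s_sets)
    # At the root (empty prefix), every S-set is still a candidate.
    stack = [(root, set(range(len(s_sets))))]
    while stack:
        node, candidates = stack.pop()
        # Every candidate contains all elements on the path to this node,
        # so each R-set ending here is contained in each candidate.
        for rid in node.r_ids:
            for sid in sorted(candidates):
                yield rid, sid
        for elem, child in node.children.items():
            narrowed = candidates & index.get(elem, set())
            if narrowed:  # an empty list prunes the whole subtree
                stack.append((child, narrowed))


if __name__ == "__main__":
    R = [{1, 2}, {2}, {1, 3}]
    S = [{1, 2, 3}, {2, 4}, {1, 3}]
    print(sorted(containment_join(R, S)))
```

On the toy input in the `__main__` block, the sketch should report the pairs (0, 0), (1, 0), (1, 1), (2, 0) and (2, 2). The design choice worth noting is that R-sets sharing a prefix share the intersection work done for that prefix, which is where a prefix-tree approach can save time over checking each pair separately.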
Similar resources
Similarity Joins in Relational Database Systems
State-of-the-art database systems manage and process a variety of complex objects, including strings and trees. For such objects equality comparisons are often not meaningful and must be replaced by similarity comparisons. This book describes the concepts and techniques to incorporate similarity into database systems. We start out by discussing the properties of strings and trees, and identify t...
Set containment joins using two prefix trees (Exposé)
A common example (see Mamoulis, 2003) of a set containment problem is the matching of people, people’s skills, jobs and job skills. The way to answer “Which people are matching which job?” depends on the data structures used. In relational databases set attributes are usually modeled by normalized mapping using a separate relation for every single set attribute. People and their skills are then...
Applying Segmented Right-Deep Trees to Pipelining Multiple Hash Joins
The pipelined execution of multijoin queries in a multiprocessor-based database system is explored in this paper. Using hash-based joins, multiple joins can be pipelined so that the early results from a join, before the whole join is completed, are sent to the next join for processing. The execution of a query is usually denoted by a query execution tree. To improve the execution of pipelined ...
Plug&Join: An easy-to-use Generic Algorithm for Efficiently Processing Equi and Non-Equi Joins
This paper presents Plug&Join, a new generic algorithm for efficiently processing a broad class of different types of joins in an extensible database system. Plug&Join is not only designed to support equi joins, temporal joins, spatial joins, subset joins and other types of joins, but in contrast to previous algorithms it can be easily customized and it allows efficient processing of new types ...
PEL: Position-Enhanced Length Filter for Set Similarity Joins
Set similarity joins compute all pairs of similar sets from two collections of sets. Set similarity joins are typically implemented in a filter-verify framework: a filter generates candidate pairs, possibly including false positives, which must be verified to produce the final join result. Good filters produce a small number of false positives, while they reduce the time they spend on hopeless ...
Publication date: 2005